Abstract

Publication statistics are ubiquitous in the ratings of scientific achievement, with citation counts 
and paper tallies factoring into an individual's consideration for postdoctoral positions, junior 
faculty, and tenure. Citation statistics are designed to quantify individual career achievement, 
both at the level of a single publication, and over an individual's entire career. While some 
academic careers are defined by a few significant papers (possibly out of many), other academic 
careers are defined by the cumulative contribution made by the author's publications to the body 
of science. Several metrics have been formulated to quantify an individual's publication career, yet
none of these metrics accounts for both the collaboration group size and the time dependence of
citation counts. In this paper we normalize publication metrics in order to achieve a universal framework
for analyzing and comparing scientific achievement across both time and discipline. We study the
publication careers of individual authors over the 50-year period 1958-2008 within six high-impact
journals: CELL, the New England Journal of Medicine (NEJM), Nature, the Proceedings of the
National Academy of Sciences (PNAS), Physical Review Letters (PRL), and Science. Using the
normalized metrics (i) "citation shares" to quantify scientific success, and (ii) "paper shares" to 
quantify scientific productivity, we compare the career achievement of individual authors within 
each journal, where each journal represents a local arena for competition. We uncover quantifiable 
statistical regularity in the probability density function (pdf) of scientific achievement in all journals 
analyzed, which suggests that a fundamental driving force underlying scientific achievement is the 
competitive nature of scientific advancement. 

I. INTRODUCTION



The study of human success is difficult because information has traditionally been 
recorded for only the excellent, while individuals with lower than average careers are generally
neglected in the record books. Hence, drawing conclusions based only on the relatively
few stellar careers will suffer to some extent from selection bias. In contrast, conclusions 
drawn from the entire population might better illustrate the mechanisms of success. While 
it is not feasible to obtain the career publication data for every scientist in every journal, it 
is possible to study a subset of scientists that succeed in publishing in a specific journal. 

Several empirical studies have analyzed the citation statistics of individual papers [lH5[, 
of individual journals/fields js-ll], and of subsets of individuals {5), U 14]. In this paper, we 
study the cumulative citation statistics of individual scientists over their publication careers 
within a given journal. Studying the distribution of career accomplishment in a particular 
journal serves as a proxy for the more difficult task of studying the citation statistics of all 
individuals in all journals, where such all-encompassing data are not as readily available. 
Here we adopt the working hypothesis that studying publications in high-impact journals
offers a crude approximation to an author's scientific contribution.

We develop a simple method for normalizing citations so that they can be compared 
across time and discipline. In order to compare across time, we normalize the number of 
citations for each paper in a given year by the average number of citations to papers from the 
same publication year. This re-scaling can also help remove citation patterns that vary
across discipline, especially when considering discipline-specific journals.
We further remove discipline-specific collaboration patterns by dividing the achievement 
equally among the collaboration group members. 

This work aims to demonstrate the importance of properly normalizing any conceivable 
metric that quantifies career achievement (e.g. the citation count, h-index). Extending the 
work of Radicchi et al. [8], which normalizes the citation values of single articles across
discipline by re-scaling to local citation averages, our goal is to provide a framework for 
normalizing the scientific achievement of individual careers. The methodology developed in 
this paper should conceivably make possible the comparison of careers between various fields. 
Furthermore, we are able to study the mechanisms of human success in scientific arenas, 
where effective competition arises from limited financial, temporal, and creative resources. 

In addition to studying the distribution of success and productivity, in this paper we
also investigate the waiting time between successive achievements, which is intrinsically
related to the underlying mechanism of progress. Recent work in [16] demonstrates
that the Matthew effect (the "rich-get-richer" effect) can be quantified by analyzing the
career longevity of employees within competitive professions, such as professional sports
and academia. Here, we demonstrate the Matthew effect on the scale of individual authors
by analyzing the time intervals between successive publications in high-impact journals.
The Matthew effect [17] derives from a passage in the Gospel of Matthew and is a popular
conceptual theory in sociology. This theory is analogous to several other positive feedback
or cumulative advantage theories [19] which have been used to explain the ubiquity
of right-skewed distributions that arise in socioeconomic studies. Of particular note, the
generic preferential attachment mechanism is relevant to the dynamics of citations
as well as the dynamics of human sexual networks [22].

First, we briefly summarize several results which are relevant to the analysis of success
and productivity performed in this paper. A seminal study performed over 50 years ago
by W. Shockley examined the rate of productivity, measured by the total number of
publications and the total number of patents filed at several large research institutions. In
that study, Shockley suggests that normalizing metrics for output (e.g. patents, papers,
citations) by the number of individuals could alleviate the discrepancy between disciplines.
In this paper, we normalize metrics for output by the number of contributors such that they
are weighted "shares", a procedure recently employed in the calculation of h-index values

Q. 



Laherrere et al. [12] analyze the top 1120 most-cited physicists over the sixteen-year
period 1981-1997, with the result that the distribution of cumulative citation counts among
these scientists, without any normalization procedure, is described by a stretched exponential
pdf.

Redner [1] analyzes approximately 800,000 individual papers and finds that the pdf P(x)
of citations per paper x follows an approximate inverse-cubic power law. This result is found
by analyzing the Zipf plot of the number of citations to a particular paper. Interestingly, we
find that this result is maintained even after the normalization procedure developed in this
paper. Redner Q also analyzes 110 years of citation statistics in Physical Review journals, 
where he calculates the citation distribution of 353,268 papers, and finds a log-normal pdf
P(x) (without normalizing for publication time). In addition to the size and coverage of the
Physical Review database, another impressive feature of this study is the analysis of citation
dynamics, relating the citation growth rate to the number of contemporaneous citations. Of
particular note, Redner finds an approximately linear citation (attachment) rate for citations
originating from within Physical Review publications. Also, a recent study [10] analyzes the
citation dynamics in 2,267 journals and finds that the time-dependent average number of
citations per paper within each journal approaches a steady state value which can be used as
a normalizing factor for comparing journals across discipline. In this paper, we use the
time-dependent average number of citations per paper for a particular journal and publication
year as the normalizing factor in order to compare articles across time and discipline.



Recently, Hirsch [13] proposed the h-index as an unbiased metric to quantify scientific
impact. The h-index is calculated using the raw number of citations for each of an author's 
papers. Although simple in its definition, the h-index has encountered scrutiny, with the 
opposition claiming that the definition of the h-index is biased in that it neglects differences 
in publication patterns between scientific (sub) disciplines. It is further biased in that it 
neglects variations in the size of collaborations, and hence, the credit associated with a 
given publication. In Ref. [l^], Batista et al. normalize the h-index to the number of 
authors contributing to each paper in order to account for differences in publication styles 
across discipline. Two additional studies suggest that normalizing by the size of the field can 
alleviate the differences in research and publication style across disciplines [2, however 
the relationship between citation trends and field size is not trivial and depends on several
factors [11]. Very recently, an enormous study of twenty-five million papers by Wallace et
al. [4] implements Tsallis statistics to investigate the distribution of citations for papers
spanning the 106-year period 1900-2006. The extensive analysis in [4] also discusses the
changes in citation trends over time and the "uncited" phenomenon.

Our main result is to provide the first study that quantifies the career publication statistics
of individual authors while normalizing the publication statistics with respect to two factors:

(i) the number of authors credited for a particular publication,

(ii) the time-dependent increase of citation counts.

Specifically, we account for (i) by normalizing citation counts and paper tallies to the number
of contributing authors, and (ii) by normalizing citations by the local average number of
citations per paper; the local average is computed from the set of papers published in
the same journal in the same year. While these two factors have been discussed, to our
knowledge no study has incorporated both simultaneously.

In order to compare careers that are of similar duration, we use methods described in
[24] to isolate "completed" careers from our journal databases, which all span the 50-year
period 1958-2008 except for CELL, which was first published in 1974. For the subset of careers
that meet a completion criterion, we normalize each individual citation according to factors
(i) and (ii). We then tally the normalized citation shares for each scientist, which serves as
one possible metric for career accomplishment. We also perform the analogous procedure
for paper shares, which serves as a metric for career productivity.

The organization of this paper is as follows: in Section II we review the data analyzed,
the procedure with which we aggregate the data into publication careers, and the possible
systematic errors inherent in our method. In Section III we analyze the distribution of both
citation and productivity statistics for three high-impact multidisciplinary journals: Nature,
the Proceedings of the National Academy of Sciences (PNAS), and Science, and also for
three less multidisciplinary journals: CELL, the New England Journal of Medicine (NEJM),
and Physical Review Letters (PRL). We note that only three of the journals analyzed
are discipline specific, and so we rely significantly on the results of [3, 8] in justifying the
comparison of normalized career metrics across discipline beyond our results for the
high-impact journals CELL, NEJM, and PRL.

II. DATA AND METHODS 



We downloaded journal data in May 2009 from ISI Web of Knowledge [25]. We restrict
our analysis to publications termed "Articles," which excludes reviews, letters to editors,
corrigenda, etc. For each journal, we combine all publications into one database. In total,
these data represent approximately 350,000 articles and 600,000 scientists (see Table I).

Our data collection procedure begins with downloading all "articles" for each journal for
year y from ISI Web of Knowledge. From the set of N(y) articles for each particular journal
and year, we calculate ⟨c(y)⟩, the average number of citations per article at the date of data
extraction (May 2009). Each article summary includes a field for a contributing author's
name identification, which consists of a last name and first and middle initial [26]. From
these fields, we aggregate the career works of individual authors within a particular journal.
In this paper we develop normalized metrics for career success and productivity, while in
[16] we compare theory and empirical data for career longevity.

TABLE I: Summary of data set size for each journal. Total number N of unique (but possibly
degenerate) name identifications.

Journal   Years       Articles   Authors, N
CELL      1974-2008   53,290     31,918
NEJM      1958-2008   17,088     66,834
Nature    1958-2008   65,709     130,596
PNAS      1958-2008   84,520     182,761
PRL       1958-2008   85,316     112,660
Science   1958-2008   48,169     109,519

For each author, we combine all his/her articles in a given journal. Specifically, a
publication career in this paper refers to the lifetime achievements of a single author within a
single journal, and not the lifetime achievements combined among the six journals analyzed.
We define n as the total number of papers for a given author in a given journal over the
50-year period. In analogy with the traditional citation tally, one can calculate the career
success/impact within a given journal by adding together the citations c_i received by the n
papers,

$$ C = \sum_{i=1}^{n} c_i . \tag{1} $$



Furthermore, one can calculate the career productivity of a given author within a specific 
journal as the total number P of papers published within the journal. A main point raised 
in this paper is to discount the value of citation metrics which do not take into account the 
time-evolution of citation accumulation. 

Naturally, some older papers will have more citations than younger papers only because
the older papers have been in circulation for a longer time. In Fig. 1 we plot ⟨c(y)⟩, the
average number of citations for articles from a given year, and confirm that the time-dependence
of citation accumulation is an important factor. Interestingly, it is found in [10] that the
pdf of citations from papers within a given year and journal is approximately log-normal,
where the average value of the distribution has a time-dependent drift. With increasing
time, the pdf approaches a steady state distribution which is also approximately log-normal.
Hence, the non-monotonicity in ⟨c(y)⟩ suggests that an important factor in the dynamics of
citation counts is the growth with time of the scientific body and the scientific output. The
mechanism underlying the evolution of citation trends and impact factors is complex, where
it is found that citation growth rates decompose into several components in addition to the
growth of science [11]. Another criticism of Eq. (1) is that it does not take into account
the variability in number of coauthors, which varies both within and across discipline (see
Fig. 3).

To remedy these problems, we propose a simple success metric termed citation shares,
which normalizes the citations c_i(y) of paper i by ⟨c(y)⟩, the average number of citations for
papers in a given journal in year y, and divides the quantity c_i(y)/⟨c(y)⟩ into equally
distributed shares among the a_i coauthors. Dividing the shares equally will obviously discount
the value of the efforts made by greater contributors while raising the value of the efforts
made by lesser contributors. Without more accurate reporting schemes on the extent of
each author's contribution (as is now implemented in e.g. Nature and PNAS), dividing the
shares equally is the most reasonable method given the available data. Hence, we calculate
the normalized career citation shares as

$$ C_s = \sum_{i=1}^{n} \frac{1}{a_i} \, \frac{c_i(y)}{\langle c(y) \rangle} . \tag{2} $$

An analogous estimator for career productivity is P_s, the total number of paper shares within
a given journal,

$$ P_s = \sum_{i=1}^{n} \frac{1}{a_i} , \tag{3} $$

which partitions the credit for each publication into equal shares among the coauthors.
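
The citation shares C_s and paper shares P_s defined above can be illustrated with a minimal Python sketch. The record layout (dictionaries with the keys `journal`, `year`, `citations`, and `authors`) and the function name `career_shares` are hypothetical choices made for this illustration and are not taken from the data set; papers from a journal-year whose average citation count is zero are assigned zero normalized citations, which is also an assumption.

```python
from collections import defaultdict


def career_shares(papers):
    """Compute citation shares C_s and paper shares P_s per (author, journal).

    papers: list of dicts with the hypothetical keys
            'journal', 'year', 'citations', 'authors' (list of name IDs).
    """
    # Local average <c(y)>: mean citations per paper for each (journal, year).
    totals = defaultdict(lambda: [0, 0])  # key -> [citation sum, paper count]
    for p in papers:
        key = (p["journal"], p["year"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1
    mean_c = {key: s / n for key, (s, n) in totals.items()}

    citation_shares = defaultdict(float)
    paper_shares = defaultdict(float)
    for p in papers:
        avg = mean_c[(p["journal"], p["year"])]
        # Assumption: a journal-year with zero average citations contributes nothing.
        normalized = p["citations"] / avg if avg > 0 else 0.0
        a = len(p["authors"])  # collaboration size a_i
        for author in p["authors"]:
            career = (author, p["journal"])
            citation_shares[career] += normalized / a  # equal split of c_i / <c(y)>
            paper_shares[career] += 1.0 / a            # equal split of one paper
    return dict(citation_shares), dict(paper_shares)


example = [
    {"journal": "J", "year": 2000, "citations": 10, "authors": ["A", "B"]},
    {"journal": "J", "year": 2000, "citations": 30, "authors": ["A"]},
    {"journal": "J", "year": 2001, "citations": 5, "authors": ["B"]},
]
c_s, p_s = career_shares(example)
```

A useful sanity check of the normalization is that the citation shares within one journal sum to the number of papers in that journal, since the normalized citations of each journal-year average to one.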

There is another sampling bias that we address. Currently, we assume that all careers
are comparable in their duration, or more precisely, maturity. However, without further
consideration, this assumption would mean that we are comparing the careers of graduate
students with those of seasoned professors. Hence, we implement a standard method, described
in [24], to isolate "completed" careers from our data set, which begins at year Y_0 and ends
at year Y_f. For each author z we calculate ⟨Δτ_z⟩, his/her average
time between successive publications in a particular journal. A career which begins with
the first recorded publication in year y_{z,0} and ends with the final recorded publication in
year y_{z,f} is considered "complete" if the following two criteria are met:

(i) y_{z,f} < Y_f − ⟨Δτ_z⟩ and

(ii) y_{z,0} > Y_0 + ⟨Δτ_z⟩.

In other words, this method estimates that the career begins in year y_{z,0} − ⟨Δτ_z⟩ and ends
in year y_{z,f} + ⟨Δτ_z⟩. If either the beginning or ending year does not lie within the range of the
database, then we discount the career as incomplete to first approximation. Statistically,
this means that there is a significant probability that this author published before Y_0 or will
publish after Y_f. Using this criterion reduces the size of the data set by approximately 25%
(compare the raw data set sizes N in Table I to the data set sizes N* in Table II). The
results reported in this paper are, unless otherwise stated, based on the analysis of only the
subset of "completed" careers. Another justification for this criterion is that we recently
implemented this method in previous work on career longevity [16], which is more sensitive
to using or not using the criterion.
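
The two completion criteria can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the treatment of single-paper careers (for which no waiting time can be measured) is an assumption of the sketch, exposed as the parameter `single_paper_gap`, because the text does not state how such careers are handled.

```python
def mean_gap(pub_years):
    """Average time between successive publications, in years."""
    years = sorted(pub_years)
    return (years[-1] - years[0]) / (len(years) - 1)


def is_completed(pub_years, db_start, db_end, single_paper_gap=1.0):
    """Return True if a career satisfies both completion criteria.

    pub_years: publication years of one author in one journal.
    db_start, db_end: first and last year covered by the database.
    single_paper_gap: ASSUMED placeholder gap for one-paper careers.
    """
    if not pub_years:
        raise ValueError("a career needs at least one publication")
    years = sorted(pub_years)
    gap = mean_gap(years) if len(years) > 1 else single_paper_gap
    ends_early_enough = years[-1] < db_end - gap     # criterion (i)
    starts_late_enough = years[0] > db_start + gap   # criterion (ii)
    return ends_early_enough and starts_late_enough


# A career with papers in 1970, 1975, 1980 has a mean gap of 5 years and
# lies well inside a 1958-2008 database, so it counts as completed.
print(is_completed([1970, 1975, 1980], 1958, 2008))
```

Careers whose first paper falls within one mean gap of the database start, or whose last paper falls within one mean gap of the database end, are rejected by this check.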

We note some potential sources of systematic error in the use of this database: 

1. Degenerate names lead to misleading increases in career totals.

2. The use of middle initials in some but not all instances of publication decreases
career totals.

3. A mid-career change of (last) name decreases career totals.

4. Sampling bias due to finite time period. Recent young careers are biased toward short
careers. Long careers located toward the beginning Y_0 or end Y_f of the database are
biased toward short careers and hence decrease career totals.

Radicchi et al. [8] observe that the method of concatenated author ID leads to a pdf
P(d) of degeneracy d that scales as P(d) ~ d^{-3}, which contributes to the systematic error
mentioned in item 1. Although the size of our data set guarantees almost surely that such
errors exist (given the prevalence of last names Wang, Lee, Johnson, etc.), these errors
should be negligible in the estimation of pdf parameters quantifying a significant portion of
the data set.



III. RESULTS 

A. Individual Papers 

The growth dynamics of citations vary, ranging from stunted growth to steady growth 
and, in some cases where research is published ahead of its time, to late blooming growth 
Q]. One objective of this paper is to account for the time-dependence of citation counts in a 
consistent way so that citations can be compared across time. We de-trend citation counts 
to a time-independent framework by dividing the number of citations a paper has received 
by the average number of citations for all papers published in the given journal in the same 
year. In Fig. 1 we plot ⟨c(y)⟩, the average number of citations per paper, where the average
is performed over the full set of papers in each given journal for each year y. We note that
⟨c(y)⟩ approaches zero as the year becomes contemporaneous with the data download date,
and that the peak value of ⟨c(y)⟩ occurs for papers published approximately 15-20 years
before our data download date in 2009. The presence of this maximum value reflects the
growth of the scientific body, the growth of scientific productivity, and the time delay over
which ideas become relevant and established. See Ref. [10] for the average citation profiles
⟨c(y)⟩ of 2,266 journals indexed by ISI. Normalizing to this standard baseline allows one to
compare the success of papers across scientific disciplines, as first demonstrated in Ref. [8].

In order to visualize the effects of normalizing citations to a local average in the case
of single papers, we compare the un-normalized cumulative distribution functions (cdf) of
Fig. 2(A) with the normalized cdf of Fig. 2(B). The procedure of normalizing by the local
average reduces the variations across journal (discipline [8]), revealing a universal scaling
law P(x > c) ~ c^{-(γ-1)} with γ − 1 ≈ 2. The scaling exponent γ ≈ 3 describing the success of
individual papers was first reported in [1], where normalizing techniques were not employed.
Surprisingly, we observe the same value of γ here for the normalized citation statistics of
individual papers from several major journals over a 50-year period.

In addition to time-dependent factors, we also consider factors resulting from various
research styles across the broad range of scientific disciplines. In science, the resources required
to make significant scientific advances range from a pencil and paper to million-dollar
laboratory equipment. Similarly, the number of contributors to scientific advances ranges from
a single scientist to projects involving several hundred scientists. Fig. 3 illustrates the pdf of
collaboration size associated with a single publication. The pdf is significantly right-skewed,
especially for publications in NEJM and PRL where, occasionally, the number of authors
contributing to a single publication exceeds 100 individuals. For instance, in the cases of
research at major medical institutes and particle accelerators, it is common for the credit for
the scientific advance reported in a publication to be shared by extremely large numbers of
contributors. In Eqs. (2) and (3) we choose a simple weighting recipe for associating credit
among the a_i authors of a single paper i. We assign equal credit to all authors. Although this
recipe may grant some authors more credit than due, it also grants other authors less
credit than due. We believe this weighting scheme is useful in proportionally sharing the
credit for a scientific advancement among the a_i authors. To address this issue, the journals
Nature and PNAS require the corresponding author to assign credit to each co-author across
a broad range of categories such as theoretical analysis, experimental methods, and writing
of the manuscript. If adopted across all journals, this formalism could potentially improve
the quantitative allocation of scientific credit, thus improving the quantitative measures for
individual scientific impact.

B. Citation Shares 

In Figs. 4(A) and 4(B) we present the pdfs of career citations C and of career citation
shares C_s, corresponding to Eq. (1) and Eq. (2), respectively. While the six pdfs of C in
Fig. 4(A) are all similarly right-skewed, the collapse onto a universal function is weak for
small values of C.

The discrepancy between the pdf curves for small and large citation counts in Fig. 4(A)
is likely due to factors associated with the size of the scientific field, the size of the
collaboration group and the impact of the research. Since we study only six high-impact
journals, these factors should be negligible in the overall difference across discipline and
journal, since we assume both discipline and journal are large. Hence, the collapse of the six
pdfs for normalized citation shares in Fig. 4(B) demonstrates that normalization is necessary.

TABLE II: Summary of citation shares for "completed" careers. The reduced data set
has size N*. The average number of citation shares ⟨C_s⟩ for careers within each journal is
computed from the subset of "completed" careers (the value in parentheses corresponds to the
value for all careers). The value of the power-law exponent β corresponds to the Zipf plot of
citation shares plotted in Fig. 5, where we calculate the value of β using data in the range
10 < rank < N_MLE by implementing a linear regression on a log-log scale, where N_MLE is the
number of data values used to calculate α. The value of the power-law exponent α corresponds
to the pdf of citation shares plotted in Fig. 4(B), where we calculate the value of α using
Hill's maximum likelihood estimator for data values greater than a cutoff C_s^c = 1.

Journal   N*        ⟨C_s⟩ (all careers)   β             α
CELL      23,060    0.34 (0.35)           0.52 ± 0.01   2.60 ± 0.04
NEJM      49,341    0.25 (0.26)           0.45 ± 0.01   2.65 ± 0.04
Nature    94,221    0.46 (0.50)           0.56 ± 0.01   2.42 ± 0.02
PNAS      118,757   0.42 (0.46)           0.57 ± 0.01   2.50 ± 0.02
PRL       72,102    0.61 (0.75)           0.55 ± 0.01   2.25 ± 0.02
Science   82,181    0.43 (0.44)           0.56 ± 0.01   2.43 ± 0.02


For each pdf P(C_s) we observe a scaling regime

$$ P(C_s) \sim C_s^{-\alpha} , \tag{4} $$

and we estimate the scaling exponent α in the tail of the pdf using the maximum likelihood
estimator (MLE), also known as the Hill estimator [27, 28].

We list α values calculated for C_s > C_s^c = 1 in Table II. The Hill estimator is a robust
method for approximating power-law exponents which incorporates each data observation
C_s^i greater than a cutoff value C_s^c into the calculation of α,

$$ \alpha = 1 + N \left[ \sum_{i=1}^{N} \ln \frac{C_s^i}{C_s^c} \right]^{-1} , \tag{5} $$

with standard error

$$ \delta\alpha \approx (\alpha - 1)/\sqrt{N} . \tag{6} $$
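
The Hill estimator and its standard error can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `hill_estimator` and its return layout are choices made here, and the default cutoff of 1 simply mirrors the cutoff quoted in the text.

```python
import math


def hill_estimator(values, cutoff=1.0):
    """Hill maximum likelihood estimate of a power-law tail exponent.

    Uses only observations strictly greater than `cutoff`.
    Returns (alpha, standard_error, n_tail).
    """
    tail = [x for x in values if x > cutoff]
    n = len(tail)
    if n == 0:
        raise ValueError("no observations above the cutoff")
    log_sum = sum(math.log(x / cutoff) for x in tail)
    alpha = 1.0 + n / log_sum                  # exponent estimate
    stderr = (alpha - 1.0) / math.sqrt(n)      # approximate standard error
    return alpha, stderr, n


# Deterministic check: evenly spaced quantiles of a pure power law with
# exponent 2.5 and lower bound 1 should give an estimate close to 2.5.
N = 10000
sample = [(1.0 - (k + 0.5) / N) ** (-1.0 / 1.5) for k in range(N)]
alpha_hat, alpha_err, n_tail = hill_estimator(sample)
```

Because the estimate depends on the chosen cutoff, it is worth repeating the calculation for a few nearby cutoff values to see how stable the exponent is.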

For each journal, the number of data points greater than C_s^c used in the calculation of
α is approximately 10% of the total data set size N*. Remarkably, the scaling exponent for
citation statistics of completed careers is approximately 2.5 for all journals analyzed. Hence,
we find convincing evidence for a universal scaling function representing the distribution of
citation shares for scientific careers in competitive high-impact journals. Interestingly, the
values of α for each journal are less than the value γ ≈ 3 which describes the scaling
of normalized single article citation counts in Fig. 2. This result implies that the success of
individuals over their entire careers is not related in a simple way to the success of a random
number of independent articles. Instead, there is a larger number of stellar careers than
would be expected from the number of stellar papers.

TABLE III: The top 20 authors (not necessarily "completed") in the journals CELL, NEJM, and
PRL, ranked according to citation shares C_s accumulated from their n papers published in each
journal. Our normalization procedure offers one way to quantitatively order the top authors.

CELL                       NEJM                       PRL
Name           C_s    n    Name           C_s    n    Name             C_s    n
GREEN, H       49.7   35   BRAUNWALD, E   30.3   59   WEINBERG, S      313.3  49
BALTIMORE, D   33.8   64   KOCHWESER, J   23.6   28   ANDERSON, PW     137.4  64
MANIATIS, T    29.5   55   MCCORD, JM     20.2   1    WILCZEK, F       120.0  62
SHARP, PA      25.1   41   FINLAND, M     17.4   36   TERSOFF, J       105.1  76
TJIAN, R       23.S   45   HENNEKENS, CH  16.9   36   HALDANE, FDM     102.3  38
LEDER, P       22.4   39   REICHLIN, S    16.7   10   YABLONOVITCH, E  87.5   21
AXEL, R        20.9   52   VECCHIO, TJ    14.8   1    PERDEW, JP       78.3   20
WEINTRAUB, H   20.5   46   STAMPFER, MJ   14.3   45   LEE, PA          74.6   76
KARIN, M       18.5   40   TERASAKI, PI   13.7   29   PENDRY, JB       74.1   29
RUBIN, GM      18.0   52   OSSERMAN, EF   13.7   6    PARRINELLO, M    72.8   68
KOZAK, M       17.1   6    KUNIN, CM      13.5   16   FISHER, ME       71.6   67
ROEDER, RG     15.5   44   YUSUF, S       13.4   18   CIRAC, JI        66.7   97
RHEINWALD, JG  14.7   7    ROSEN, FS      13.2   42   HALPERIN, BI     66.7   50
EVANS, RM      14.1   32   CHALMERS, TC   13.1   30   RANDALL, L       63.4   14
OFARRELL, PH   13.9   14   AUSTEN, KF     12.9   30   BURKE, K         63.2   18
GLUZMAN, Y     13.3   2    WELLER, TH     12.7   7    JOHN, S          62.8   20
HUNTER, T      13.2   27   GARDNER, FH    12.6   19   GEORGI, H        61.9   26
GOLDSTEIN, JL  13.0   36   DIAMOND, LK    12.6   18   CAR, R           59.8   51
PENMAN, S      12.9   30   FEINSTEIN, AR  12.2   16   GLASHOW, SL      59.6   37
BROWN, MS      12.8   35   MERRILL, JP    11.9   25   CEPERLEY, DM     58.9   39

Another illustrative method for comparing the distribution of success across the entire
range of individuals is the popular Zipf plot, which is mathematically related to the pdf
[llS]. In Fig. 5 we plot C_s versus rank for the same set of completed careers analyzed in
Fig. 4(B). The Zipf plot emphasizes the scaling in the tail of the pdf, which is represented
by the top-ranked values. We calculate the scaling exponent β of the Zipf plot for rank values in
the range 10 < r < r_c for each journal, where r_c corresponds to the number of data points
incorporated into the calculation of α using Hill's MLE. These values are in approximate
agreement with the expected relationship 1 + 1/β ≈ α.
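
The expected relationship between the two exponents follows from a short calculation, sketched here under the assumptions that the pdf is a pure power law above the cutoff and that the rank r of a career with citation shares C_s is proportional to the number of careers with a larger value:

$$ r(C_s) \propto \int_{C_s}^{\infty} x^{-\alpha} \, dx \propto C_s^{-(\alpha - 1)} \quad \Longrightarrow \quad C_s(r) \propto r^{-1/(\alpha - 1)} \equiv r^{-\beta} \quad \Longrightarrow \quad \alpha = 1 + \frac{1}{\beta} . $$

Deviations from a pure power law, or a fitting range that extends outside the scaling regime, would make the measured exponents satisfy this relation only approximately.
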

The small range of β values across journals (see Table II) demonstrates that our
normalization procedure places scientific accomplishments on a comparable footing across both
time and discipline. In Table III we list the top 20 publication careers according to citation
shares. This table consists mostly of careers that have many papers of significant impact;
however, it also contains a few careers that are distinguished by a small number of seminal
papers. Hence, while longevity at the upper tier of science is good at assuring reputation
and success, there are also a few instances of success achieved via a singular yet monumental
accomplishment.



C. Paper Shares 



We now focus on scientific productivity, quantified by the number of papers published by 
a given author. In Fig. Owe plot the pdfs for paper shares defined in Eq.([n]). In order to 
collapse the pdfs for the six journals analyzed, we hypothesize that a universal function for 
productivity can be written as P{P S ) = f(P s /{P s ))- In an effort to compare the pdfs across 
discipline, we approximate the generic pdf of paper shares by a log-normal distribution with 
a heavy tail after a cutoff value P s c . Quantitatively, we represent the general form of the pdf 

(IS . 



P(ft)<x' A«p[-(lnP.-„)V2^ 



p < pc 
1 a 1 s 



(7) 



p-a p > pc 

The least-squares parameters for the log-normal fit and the MLE parameter for the scaling
regime are listed in Table IV. The log-normal distribution is consistent with the prediction
by Shockley [23] that productivity (as estimated here by paper shares) is a result of a series
of multiplicative factors, which can lead to log-normal [29], stretched exponential [30], and
even power-law [31] distributions, given the appropriate set of systematic conditions.
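
The link between multiplicative factors and a log-normal form can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not a model taken from the paper: each hypothetical career is assigned a productivity equal to the product of independent positive factors, here drawn uniformly from (0.5, 1.5), and both the number of factors and the factor distribution are arbitrary choices. Because the logarithm of a product is a sum of independent terms, the logarithm of the productivity is approximately normal, so the productivity itself is approximately log-normal.

```python
import math
import random
import statistics


def simulate_productivity(n_factors, n_careers, rng):
    """Productivity of each simulated career: a product of independent positive factors."""
    return [
        math.prod(rng.uniform(0.5, 1.5) for _ in range(n_factors))
        for _ in range(n_careers)
    ]


def moments(values):
    """Return (mean, standard deviation, skewness) of a list of numbers."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    skew = sum(((v - mean) / std) ** 3 for v in values) / len(values)
    return mean, std, skew


if __name__ == "__main__":
    rng = random.Random(1)
    productivity = simulate_productivity(n_factors=12, n_careers=20000, rng=rng)
    # The raw values are strongly right-skewed ...
    print("raw (mean, std, skew):", moments(productivity))
    # ... while their logarithms are close to symmetric, as expected for a normal law.
    print("log (mean, std, skew):", moments([math.log(p) for p in productivity]))
```

The fitted parameters $\mu$ and $\sigma$ of the log-normal regime correspond to the mean and standard deviation of the logarithms in such a picture.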



D. Matthew Effect 



We conclude this section with quantitative evidence of the "Matthew effect" in the
advancement of scientific careers. In Fig. 7 we plot the average waiting time between
publications $\langle \tau(n) \rangle$ for all authors that meet the complete career criterion by averaging the
difference in publication year for the paper $n$ and the paper $n + 1$. The values of $\langle \tau(1) \rangle$ for
each journal are 2.2 (CELL, PRL), 3.0 (Nature, PNAS, Science) and 3.5 (NEJM) years. The
decrease in waiting time between publications is a signature of the cumulative advantage
mechanism qualitatively described in [19] and quantitatively analyzed in [la. Il8|. To avoid
presenting statistical fluctuations arising from the small size of data sets, we only present
$\langle \tau(n) \rangle$ computed for data sets exceeding 75 observations.

TABLE IV: Summary of paper shares for "completed" careers. The values of the log-normal
fit parameters $\mu$ and $\sigma$ correspond to the pdf before the cutoff value of $P_s^c \approx 2$ paper shares.
The values of $\alpha$ are calculated using the data values after the cutoff $P_s^c = 1$ paper shares, which
corresponds to approximately 8% of the total data for each journal.

Journal    $\mu$          $\sigma$       $\alpha$
CELL       -1.7 ± 0.1     0.7 ± 0.1      2.60 ± 0.05
NEJM       -1.7 ± 0.1     1.0 ± 0.1      2.60 ± 0.02
Nature     -1.3 ± 0.1     1.0 ± 0.1      2.74 ± 0.05
PNAS       -1.6 ± 0.1     0.7 ± 0.1      2.56 ± 0.02
PRL        -1.1 ± 0.1     1.0 ± 0.1      2.35 ± 0.02
Science    -1.4 ± 0.1     0.9 ± 0.1      2.61 ± 0.02

To explain the steady decline of the curve for PRL, we note that PRL has many authors
with many articles ($n > 100$). A possible explanation is that a significant number of these
authors are involved in large particle accelerator experiments with multiple collaborating
groups. These multilateral projects contribute significantly to the heavy tail observed in the
pdf of the number of authors per paper (Fig. 3). Hence, the decay of the PRL curve toward
zero might be due to project leaders at large experimental institutions, who produce many
significant results per year over many years. Furthermore, the ordering
of the curves in Fig. 7 suggests that it is more difficult at the beginning of a career to
repeatedly publish in CELL than in PRL. Reaching a crossover point along the career ladder is
a generic phenomenon observed in many professions. Accordingly, surmounting this abstract
crossover is motivated by significant personal incentives, such as salary increase, job security,
and managerial responsibility.
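
The waiting-time average used in this section can be sketched in a few lines. This is a minimal illustration, not the authors' code: the input format (a mapping from an author label to a list of publication years within one journal) and the function names are assumptions, while the threshold of 75 observations and the rescaling by the first waiting time follow the description above.

```python
from collections import defaultdict


def mean_waiting_times(careers, min_obs=75):
    """Average gap in years between paper n and paper n + 1, pooled over authors.

    careers: dict mapping an author label to a list of publication years
             (hypothetical input format).
    Only values of n with more than min_obs observed gaps are returned.
    """
    gaps = defaultdict(list)
    for years in careers.values():
        ordered = sorted(years)
        for n, (first, second) in enumerate(zip(ordered, ordered[1:]), start=1):
            gaps[n].append(second - first)
    return {
        n: sum(values) / len(values)
        for n, values in sorted(gaps.items())
        if len(values) > min_obs
    }


def rescale_by_first(tau):
    """Divide every average waiting time by the average first waiting time tau[1]."""
    return {n: t / tau[1] for n, t in tau.items()}


if __name__ == "__main__":
    # Toy input with two made-up careers; min_obs is lowered so that output is produced.
    toy = {"author_a": [2000, 2003, 2004], "author_b": [1990, 1991, 1995, 1996]}
    tau = mean_waiting_times(toy, min_obs=0)
    print(tau)
    print(rescale_by_first(tau))
```

A decreasing rescaled curve would correspond to the pattern interpreted here as cumulative advantage.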






IV. DISCUSSION 



Scientific careers share many qualities with other competitive careers, such as the careers
of professional sports players, inventors, entertainers, actors, and musicians [15, 32, 33].
Limited resources such as employment, salary, creativity, equipment, events, data samples,
and even individual lifetime contribute to the formation of generic arenas for competition.
Hence, of interest here is the distribution of success and productivity in high-impact journals,
which in principle have high standards of excellence.

In science, there are unwritten guides to success requiring ingenuity, longevity, and
publication. We observe a quantifiable statistical regularity describing publication careers of
individual scientists across both time and discipline. Interestingly, we find that the scaling
exponent for individual papers ($\gamma \approx 3$) is larger than the scaling exponent for total citation
shares ($\alpha \approx 2.5$) and the scaling exponent for total paper shares ($\alpha \approx 2.6$), which indicates
that there is a higher frequency of stellar careers than stellar papers. This is consistent
with the observation that a stellar career can result from an arbitrary combination of stellar
papers and consistent success, as demonstrated in Table III. In all, the statistical regularity
found in the distributions for both citation shares and paper shares lends itself naturally to
methods based on extreme statistics in order to distinguish stellar careers [16]. Such methods have
been developed for Hall of Fame candidacy in baseball [34], where statistical benchmarks
are established using the distribution of success.

Statistical physicists have long been interested in complex interacting systems, and are
beginning to succeed in describing social dynamics using models that were developed in the
context of concrete physical systems [35]. This study is inspired by the long-term goal of
using quantitative methods from statistical physics to answer traditional questions rooted in
social science [36], such as the nature of competition, success, productivity, and the universal
features of human activity. Many studies begin as empirical descriptions, such as the studies
of common mobility patterns [37], sexuality [38, 39], and financial fluctuations [40], and lead
to a better understanding of the underlying mechanics. It is possible that the empirical
laws reported here will motivate useful descriptive theories of success and productivity in
competitive environments.

FIG. 1: The average number of citations $\langle c(y) \rangle$ per article for each journal in year $y$ demonstrates
the time-dependence of citations. This quantity serves as a normalizing factor, so that we can
detrend citation values across different years. The popular Impact Factor (IF) [10, 11] of a journal for
a particular year is the average number of citations obtained in a given year for articles published
over the previous two years. In this paper we restrict our analysis to journals with large IF, ensuring
that there is considerable competition for limited publication space in such journals.

FIG. 2: Data collapse in the distribution of article citations for several journals is achieved by
accounting for the time dependence of citations. (a) The CDF of raw citations $c$ depends on the
field. (b) After normalizing the number of citations $c$ by $\langle c(y) \rangle$, the average number of citations for
a particular year in a particular journal, the CDFs for different journals are remarkably similar,
with a power-law tail of exponent $\gamma - 1 \approx 2$. We calculate $\gamma$ for each journal using Hill's MLE and obtain
the following values for the scaling exponent: $\gamma$ = 3.64 ± 0.12 (CELL), 3.31 ± 0.07
(NEJM), 2.87 ± 0.03 (Nature), 3.30 ± 0.04 (PNAS), 2.96 ± 0.03 (PRL), 2.86 ± 0.03 (Science). We
provide a power law (solid line) with exponent $\gamma - 1 = 2$ for reference.




[Figure 3. Horizontal axis: number of authors per paper, $a$.]

FIG. 3: The citations and credit for a publication are typically shared fully among all $a$ coauthors,
unless the journal specifically allows for designation of specific credit. The pdf of $a$ for each journal
demonstrates that the credit for a single publication can be distributed across a very broad number
of contributors. In this paper, we propose normalizing credit into fractional "shares", to account
for the variations in collaboration size. The average number of authors contributing to an article
in each journal is $\langle a \rangle$ = 4.8 (CELL), 5.9 (NEJM), 3.5 (Nature), 4.6 (PNAS), 9.1 (PRL), 3.7
(Science).

FIG. 4: We estimate the career success of a scientist within a given journal using the citation
shares metric $C_s$ defined in Eq. (2), which accounts for both the number of authors and the age
of the paper. (a) PDF of total raw citations $C$ according to Eq. (1) for "completed" careers. (b)
PDF of total citation shares $C_s$ according to Eq. (2) for "completed" careers. A given career is
considered "complete" if there is a large likelihood that the data set contains all of the particular
author's publications. The normalization procedure results in significant data collapse in panel
(b), with the value of the scaling exponent $\alpha \approx 2.5$ for all journals analyzed.

[Figure 5. Horizontal axis: rank, $r$.]

FIG. 5: The Zipf plot emphasizes the stellar careers corresponding to large $C_s$, the total number
of citation shares within a particular journal defined in Eq. (2), and shows a significant scaling
regime corresponding to the top-ranking "champions" of each journal. For comparison, we list the
top 20 careers within the journals CELL, NEJM and PRL in Table III. The total number of career
citation shares for a particular author in a given journal serves as a proxy for the career success
of the scientist. The statistical regularity in the rank ordering of scientific achievement extends
over four orders of magnitude. The similarity in scaling exponent among the journals analyzed
possibly suggests that there are fundamental forces governing success in competitive arenas such
as high-impact journals. For visual clarity, we plot the power law with scaling exponent $\beta = 0.5$.

FIG. 6: As a proxy for career productivity, we define paper shares $P_s$ in Eq. (3), which accounts
for variations in the size of the collaboration. In order to collapse the pdfs of total paper shares
for completed careers within the six journals analyzed, we hypothesize that the universal scaling
function quantifying productivity can be written as $P(P_s) = f(P_s / \langle P_s \rangle)$. We approximate the
generic pdf of paper shares by a log-normal distribution with a power-law tail after a cutoff value
$P_s^c \approx 1$. We list the values of the log-normal parameters $\mu$ and $\sigma$, and the scaling parameter $\alpha$ for
each journal in Table IV. (Inset) We plot the pdf for CELL and PNAS data on log-linear axes for
$P_s < 3$ in order to demonstrate the log-normal form consistent with the prediction by W. Shockley
[23] that productivity can be modelled as a series of random multiplicative factors.

[Figure 7. Horizontal axis: author's $n$th paper.]

FIG. 7: A decreasing waiting time $\tau(n)$ between publications in a given journal suggests that a
longer publication career (larger $n$) facilitates future publications, as predicted by the Matthew
effect. We plot $\langle \tau(n) \rangle / \langle \tau(1) \rangle$, the average waiting time $\langle \tau(n) \rangle$ between paper $n$ and paper $n + 1$,
rescaled by the average waiting time between the first and second publication, $\langle \tau(1) \rangle$. The values of
$\langle \tau(1) \rangle$ are 2.2 (CELL, PRL), 3.0 (Nature, PNAS, Science) and 3.5 (NEJM) years. Physical Review
Letters exhibits a more rapid decline in $\tau(n)$, reflecting the rapidity of successive publications
(often by large high-energy experiment collaborations), which is possible in this high-impact letters
journal.